Use realloc for histogram cache and expose the cache limit. #9455

Merged (1 commit) on Aug 10, 2023


@trivialfis (Member) commented on Aug 9, 2023:

The following plots were produced with 4 local workers in a Dask LocalCluster. The initial spike in the plots is not accurate. Note that memory usage under LocalCluster does not represent real-world usage. The histogram cache is built separately for each worker.

The memory spike observed in #9452 is caused by std::vector::resize. I'm not sure which of the two approaches (realloc or std::vector::resize) is better in the longer term.
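A likely mechanism, offered here as a hedged explanation rather than a diagnosis of #9452: when std::vector::resize grows past the current capacity, it must allocate a new block and move the elements into it before freeing the old one, so both blocks are alive at the peak. std::realloc may instead extend the block in place (and some allocators, such as glibc on Linux, can remap large allocations), although it is still allowed to allocate, copy, and free. The minimal sketch below is not XGBoost's code. The class `ReallocBuffer` and the `GradPair` bin type are hypothetical. It also assumes the element type is trivially copyable, which realloc requires because it moves raw bytes without running constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Stand-in for a histogram bin: a (gradient, hessian) pair of doubles.
struct GradPair {
  double grad;
  double hess;
};

// Hypothetical realloc-backed buffer. Only valid for trivially copyable
// types, because realloc relocates bytes without calling constructors.
template <typename T>
class ReallocBuffer {
  static_assert(std::is_trivially_copyable<T>::value,
                "realloc can only relocate trivially copyable types");

 public:
  ReallocBuffer() = default;
  ReallocBuffer(const ReallocBuffer&) = delete;
  ReallocBuffer& operator=(const ReallocBuffer&) = delete;
  ~ReallocBuffer() { std::free(data_); }

  // Resize to n elements. Elements up to min(old size, n) are preserved;
  // newly added elements are value-initialised (all zeros for GradPair).
  void Resize(std::size_t n) {
    if (n == 0) {
      std::free(data_);
      data_ = nullptr;
      size_ = 0;
      return;
    }
    void* p = std::realloc(data_, n * sizeof(T));
    if (p == nullptr) {
      // realloc failed: the original block is still valid and still owned.
      throw std::bad_alloc();
    }
    data_ = static_cast<T*>(p);
    for (std::size_t i = size_; i < n; ++i) {
      data_[i] = T{};
    }
    size_ = n;
  }

  T& operator[](std::size_t i) {
    assert(i < size_);  // bounds check in debug builds
    return data_[i];
  }
  const T& operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }
  std::size_t Size() const { return size_; }

 private:
  T* data_ = nullptr;
  std::size_t size_ = 0;
};
```

The trade-off is that a realloc-based buffer gives up std::vector's support for non-trivial element types, which is acceptable for plain numeric histogram bins.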

Memory usage cannot be compared directly with 1.7.x, because the intercept is now estimated with a new method. To disable it, set base_score to 0.5 (the default in 1.7).

I have exposed the cache limit as a user-facing parameter, so that users at least have an option when memory usage becomes a problem.
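The idea of a histogram cache whose size is bounded by a user-settable limit can be sketched as follows. This is a hypothetical simplification, not XGBoost's implementation. The names `BoundedHistCache`, `max_cached_nodes`, and `GradPair` are illustrative, and the policy of clearing the whole cache when new nodes do not fit is an assumption (a real implementation may evict or rebuild differently). Each cached node owns `n_bins` bins stored back to back in one realloc-grown buffer, so the footprint never exceeds `max_cached_nodes × n_bins × sizeof(GradPair)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Stand-in for a histogram bin: a (gradient, hessian) pair of doubles.
struct GradPair {
  double grad;
  double hess;
};

// Hypothetical bounded cache: at most max_cached_nodes node histograms of
// n_bins bins each, stored back to back in one realloc-grown buffer.
class BoundedHistCache {
 public:
  BoundedHistCache(std::size_t n_bins, std::size_t max_cached_nodes)
      : n_bins_(n_bins), max_cached_nodes_(max_cached_nodes) {
    if (n_bins == 0 || max_cached_nodes == 0) {
      throw std::invalid_argument("n_bins and max_cached_nodes must be > 0");
    }
  }
  ~BoundedHistCache() { std::free(data_); }
  BoundedHistCache(const BoundedHistCache&) = delete;
  BoundedHistCache& operator=(const BoundedHistCache&) = delete;

  // True if n_new additional nodes fit under the limit.
  bool CanHost(std::size_t n_new) const {
    return node_offset_.size() + n_new <= max_cached_nodes_;
  }

  // Forget all cached histograms; the buffer is kept for reuse.
  void Clear() { node_offset_.clear(); }

  // Reserve zeroed histograms for nids. If they do not fit next to the
  // existing entries, the whole cache is cleared first (assumed policy).
  void AddNodes(const std::vector<int>& nids) {
    if (nids.size() > max_cached_nodes_) {
      throw std::length_error("more nodes than the cache limit");
    }
    if (!CanHost(nids.size())) {
      Clear();
    }
    std::size_t needed = (node_offset_.size() + nids.size()) * n_bins_;
    if (needed > capacity_) {
      // Grow with realloc; on failure the old block is left untouched.
      void* p = std::realloc(data_, needed * sizeof(GradPair));
      if (p == nullptr) {
        throw std::bad_alloc();
      }
      data_ = static_cast<GradPair*>(p);
      capacity_ = needed;
    }
    for (int nid : nids) {
      std::size_t offset = node_offset_.size() * n_bins_;
      // Only zero the slot if nid was not already cached.
      if (node_offset_.emplace(nid, offset).second) {
        std::memset(data_ + offset, 0, n_bins_ * sizeof(GradPair));
      }
    }
  }

  bool Contains(int nid) const { return node_offset_.count(nid) != 0; }

  // Pointer to the n_bins bins of a cached node; throws if nid is absent.
  GradPair* Get(int nid) {
    std::size_t offset = node_offset_.at(nid);
    assert(data_ != nullptr);
    return data_ + offset;
  }

  std::size_t Size() const { return node_offset_.size(); }
  std::size_t BytesReserved() const { return capacity_ * sizeof(GradPair); }

 private:
  std::size_t n_bins_;
  std::size_t max_cached_nodes_;
  std::unordered_map<int, std::size_t> node_offset_;
  GradPair* data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

As a worked example under the same assumptions (16-byte bins, 100 features with 256 bins each, so 25,600 bins per node), a limit of 4096 nodes caps the cache at 25,600 × 4096 × 16 bytes = 1,677,721,600 bytes, about 1.56 GiB per worker. Lowering the limit trades memory for having fewer node histograms available for reuse.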

Closes #9452.

  • Default maximum cache size:

[Figure: memory usage plot with the default cache limit]

  • Cache size limited to 4096:

[Figure: memory usage plot with the cache limited to 4096]

@trivialfis merged commit 1caa932 into dmlc:master on Aug 10, 2023 (21 checks passed).
@trivialfis deleted the hist-cache-realloc branch on August 10, 2023, 06:05.
@trivialfis (Member, Author):

BTW, I ran some benchmarks; performance is unchanged.

@trivialfis (Member, Author):

cc @WeichenXu123 for awareness.


Successfully merging this pull request may close these issues.

Large memory spike after introduction of histogram size bound